Using Speech in Visual Object Recognition
نویسندگان
چکیده
Automatic understanding of multi-modal input is the central topic in modern human computer interfaces. But the basic questions about how the interpretations provided by diierent modalities can be connected in a universal and robust manner is still an open problem. The most intuitive input modalities, speech perception and vision, can only be correlated on a qualitative content based interpretation level. But, due to vague meanings and erroneous processing results this is extremely diicult to accomplish. A simple frame based integration scheme lling appropriate slots with new analysis results will fail when ambiguous or contradictory information appears. In this paper we propose a new probabilistic framework to overcome these drawbacks. The integration model is built up from data collected in labeled test sets and psycholin-guistic experiments. Thereby, the correspondence problem is solved in a very robust and universal manner. In particular, we will show that erroneous visual interpretations can be corrected by a joint analysis of visual and speech input data.
منابع مشابه
Simulation of Position Based Visual Control and Performance Tests of 6R Robot
This paper presents simulation and experimental results of position-based visual servoing control process of a 6R robot using 2 fixed cameras. This method has the ability to deal with real time changes in the relative position of the target-object with respect to robot. Also, greater accuracy and independency of servo control structure from the target pose coordinates are the additional advanta...
متن کاملA hierarchical model for syllable recognition
Abstract. Inspired by recent findings on the similarities between the primary auditory and visual cortex we propose a neural network for speech recognition based on a hierarchical feedforward architecture for visual object recognition. When using a Gammatone filterbank for the spectral analysis the resulting spectrograms of syllables can be interpreted as images. After a preprocessing enhancing...
متن کاملOpen-Domain Audio-Visual Speech Recognition: A Deep Learning Approach
Automatic speech recognition (ASR) on video data naturally has access to two modalities: audio and video. In previous work, audio-visual ASR, which leverages visual features to help ASR, has been explored on restricted domains of videos. This paper aims to extend this idea to open-domain videos, for example videos uploaded to YouTube. We achieve this by adopting a unified deep learning approach...
متن کاملSpeaker Adaptation in Continuous Speech Recognition Using MLLR-Based MAP Estimation
A variety of methods are used for speaker adaptation in speech recognition. In some techniques, such as MAP estimation, only the models with available training data are updated. Hence, large amounts of training data are required in order to have significant recognition improvements. In some others, such as MLLR, where several general transformations are applied to model clusters, the results ar...
متن کاملSpeaker Adaptation in Continuous Speech Recognition Using MLLR-Based MAP Estimation
A variety of methods are used for speaker adaptation in speech recognition. In some techniques, such as MAP estimation, only the models with available training data are updated. Hence, large amounts of training data are required in order to have significant recognition improvements. In some others, such as MLLR, where several general transformations are applied to model clusters, the results ar...
متن کاملSpeech Emotion Recognition Based on Power Normalized Cepstral Coefficients in Noisy Conditions
Automatic recognition of speech emotional states in noisy conditions has become an important research topic in the emotional speech recognition area, in recent years. This paper considers the recognition of emotional states via speech in real environments. For this task, we employ the power normalized cepstral coefficients (PNCC) in a speech emotion recognition system. We investigate its perfor...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2000